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INTRODUCTION 

It  must  be  with  some  trepidation  that  one  ventures  to  speak  about  the 
problems  of  linear  estimation  to  an  audience  already  well  familiar  with  the 
overwhelmingly  more  difficult  nonlinear  filtering  problem  However,  perhaps  to 
compensate  for  this  spectacle,  the  organizers  have  given  me  the  opportunity  to 
speak  first,  with  considerable  latitude  in  the  choice  of  my  topics. 

For  such  an  audience,  there  will  be  no  need  to  present  a  tutorial  on  linear 
filtering,  especially  of  the  Kalman-Bucy  type.  1  chose,  therefore,  to  focus  on 
some  aspects  generally  less  familiar  to  those  with  a  'modern'  control  theory 
background,  i.e  ,  largely  a  state-space  background.  In  particular,  we  shall  begin 
with  a  discussion  of  integral  equations  and  of  the  important  Wiener-Hopf  tech¬ 
nique.  We  shall  specialize  this  to  stationBU"y  processes  over  infinite  intervals,  and 
then  describe  some  alternative,  often  computationally  better,  solution  methods 
of  Ambartzumian-Chandrasekhar  and  Krein-Levinson  for  finite-interval  prob¬ 
lems.  For  nonstationary  processes,  we  start  first  with  state-space  models  Buid  \ 

build  up  to  a  brief  description  of  the  scattering  theory  framework  for  linear  esti-  ^ 

mation.  This  will  then  lead  us  to  nonstationary  versions  of  the  Ambartzumian- 

^  This  work  wos  supported  in  part  by  the  Air  Force  Office  of  Scientific  Research,  Air  Force 
Systems  Command,  under  Contract  AF4O-aZO-79-C-0Ose;  by  the  U.S.  Army  Command  Office 
Contract  DAAG29*79-C'0215;  and  by  the  National  Science  Foundation,  under  Grant 
ENG76-10003. 
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Chandrasekhar  and  Krein-Levinson  equations,  which  we  shall  only  consider  very 
briefly,  providing  detailed  references  for  further  reading.  We  shall  conclude  with 
a  remark  on  a  possible  implication  for  nonlinear  filtering. 

Our  presentation  is  confined  to  continuous-time  processes:  a  recent  survey 
of  the  discrete-time  case  can  be  found  in  Kailath  (1960). 

In  writing  this  chapter,  it  was  a  great  help  to  have  a  carefully  prepared  prel- 
imineu-y  reduction  of  the  actual  lectures,  contributed  by  B.  Hanzon,  B.  Ursin  and 
D.  Ocone.  It  is  a  pleasure  also  to  thank  M.  Hazewinkel  for  these  and  several 
other  organizational  touches  that  made  for  an  outstanding  symposium. 
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1.  The  Integral  Equations  of  Smoothing  and  Altering 

Our  estimation  problems  will  be  discussed  in  the  context  of  the  following 
model  for  the  observed  random  process  (i/i  j: 

yt  =  Zf  +  vt  .  0  ^  t  &  T  (1) 

where  and  are  -valued  stochastic  variables  with  mean  zero,  emd  such 
that  ^ 

E  ViV^-.=-Ip6{t-s)  , 

where  6  is  the  Dirac-delta  distribution,  emd 

E{ztz',  +  +  VfZ,)  :=  A’tf ,s)  . 

a  continuous  function  on  [0, 7’]x[0,r].  This  model  describes  a  situation  in  which 
a  signal,  Z(,  can  only  be  observed  with  additive  white  noise  v,.  We  note  that 
E(t ,s)  does  not  have  to  be  a  covariance  function,  it  is  necessary  only  that 

/f(f.s)  ;=  E  yivi  =  Ip  6(t  -s)  +  K(t  ,s) 
be  a  covariance  function. 

It  will  be  useful  to  consider  two  special  cases: 

(i)  1/4-*.  for  all  s,f  with  O^s^T,  Cyst<T.  In  this  case  K[t,s)=E  Z|Z,  is  a 
covsu’iance  function 

(ii)  V|4-*.  for  all  Os  This  possibility  allows  causal  dependence  of  z  on  i/ 
(feedbacki). 

la.  The  Smoothing  Problem:  Fredholm  Equationa 

The  smoothing  problem  is  as  follows.  Given  the  observations  ji/,  :  O^s^Tj, 

find 

^  No  specia!  notation  will  be  uaed  to  indicate  matrix  on  vector  quantities.  Pnmes  will  denote 
transposes  and  E  denotes  expectation. 
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r 

*t\T  =  f  H(t.s)y,ds  (2a) 

0 

such  that  ^(z^-z  ,|}.)'(2|-2  jir)  is  minimum,  where  the  minimization  is  taken 
over  all  matrix-valued  functions  H(t,s)  of  s.  with  t  fixed,  in  the  Hilbert  space 
Z,*[0,7’].  It  is  well  knovm  that  the  following  holds. 

Theorem:  A  necessary  and  sufficient  condition  for  the  solution  of  this  smoothing 
problem  is 

£■(2,  -2j|r)l/i  ==  0  tor  all  s  e[0,7’]  .  (2b) 

Put  differently:  every  component  of  2, -2,|j.  must  be  orthogonal  to  every  com¬ 
ponent  y,,  for  edl  s  £[0,7’],  where  the  orthogonality  is  induced  by  the  inner 
product  (a,6):  =  £'  ab ,  a  and  b  scalau"  stochastic  vau'iables 

Suppose  now  further  that  the  special  case  (i)  holds,  namely  that  Zj  is 
orthogonal  to  v,  for  alt  s,<  with  OiSS'ST.  Ost-eT.  Then  the  condition  (2b)  leads 
to  the  equation 

T 

f  h(,t.T)K(r.s)dr==  K(t.T).0^s.t-6T  .  (3) 

0 

This  is  a  Fredholm  equation  of  the  second  kind  (see,  e  g  ,  Courant  and  Hilbert, 
Vol.  I,  Ch.  Ill)  and  the  solution  H(t,s)  is  called  the  Fredholm  resolvent  of 
Kft.s)  on  [0,r]x[0.7'].  Introduce  the  following  operator  notation: 

r 

HK  is  defined  as  {HK){t,s)  ~  f  H[t  .t)K{t,s)  dr  , 

0 

and  I  is  the  identity  operator  with  kernel  /(f.s)  =  Ip  d(f-s).  In  this  notation 
the  equation  (3)  can  be  written  as 

H  +  HK  =  K  (4a) 


or  in  the  equivalent  forms 
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(/  -  H){t  +  K)  =  I  -{I  i-  K)U  -  H)  .  (4b) 

Clearly  I-H  is  the  inverse,  in  the  sense  of  the  "operator  multiplication"  that  we 
have  just  defined,  of  1+K  .  Note  that  in  this  case  of  complete  orthogonality  of 
Zi  and  1/,,  the  smoothing  filter  is  precisely  the  resolvent  of  K. 

How  can  the  resolvent  be  computed?  One  answer  is  provided  by  the  so- 
ceJled  Mercer  expansion  of  K{1  ,s)  (see,  eg,  Riesz  and  Nagy,  p  34&;  we  use  a:i 
extension  to  the  vector  case): 

=  S  A4((ii(f)50t(s)  (5a) 

isl 

where  the  are  vector-valued  orthonormal  eigenfunctions  of  the  operator  K 
with  eigenvalue 

T 

f  K{t .T)ifii(T)  dr  =  Xi<pi{l)  .  i  =  . 0<<<7'  (5b) 

0 

Then  it  can  be  seen  easily  that  the  Fredholm  resolvent  of  K  can  be  written  as 

//(t.s)  =  1:  (Ai/l +  \i)»>i(f)(P<(s)  (6) 

I 

lb.  The  niteriog  Problem:  WienerHo)^  Equations 

In  the  special  case  that  T=t,  the  smoothing  problem  becomes  what  is 
known  as  a  filtering  problem..  We  shall  assume  further  that  we  are  in  one  of  the 
special  cases  (i)  and  (ii),  viz.,  that  v^  is  either  orthogonal  to  z,  for  all  s  ,t  or 
just  for  all  s  <t .  The  filtered  estimate  can  be  written 

t 

*  «l«  = /*U.s)V>  *  ('i') 

0 

where  h{t,s)  satisfies 

I 

/i(l  ,s)  +  f  h.{t  .t)K{t,s)  dr  -  K{t  ,s),  Oss^<<r  , 


(8) 
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Note  that  (or  each  fixed  t ,  we  have  a  smoothing  problem.  The  point  is  now 
that  we  have  a  collection  of  Fredholm  integreil  equations,  one  for  each  vedue  of 
t.  and  unlike  as  in  (3)  is  more  than  just  an  indexing  parameter  in  the  family  of 
equations.  The  filtering  integral  equation  is  said  to  be  of  "Wiener-Hopf  type" 
rather  than  of  "Fredholm  type"  and  the  solution  can  not  be  as  simply  expressed 
in  terms  of  Mercer  expansions  as  in  the  smoothing  problem. 

In  one  sense  then,  smoothing  appears  to  be  "easier"  than  filtering,  a  state¬ 
ment  counter  to  the  intuition  current  in  the  Kalmem-Bucy  state-space  theory 
(see,  e.g..  Meditch  (1969)  and  also  the  discussion  following  (15b)  below).  How¬ 
ever,  the  following  facts  give  some  justification  to  this  claim; 

1.  In  the  Wiener  theory  of  estimation  of  stationary  processes  over  infinite 
(smoothing)  or  semi-infinite  (filtering)  intervals,  the  smoothing  solution  is 
readily  determined  by  Fourier  transformation,  while  the  filtering  solution 
requires  the  more  difficult  Wiener-Hopf  technique  (further  elaborated 
below) 

2  In  estimation  given  a  fixed  time-interval,  smoothing  can  be  implemented 
with  time  invariant  filters  (convolutions  or  fast  Fourier  transforms,  see 
Levy,  Kailath,  Ljung  and  Morf  (1979)).  while  this  will  never  be  true  for  filter¬ 
ing. 

Ic.  The  Generalized  Wiener-Hopf  Technique 

Wiener-Hopf  equations  first  appeared  in  astrophysics  and  radiative  transfer 
theory  around  1900  In  1931,  Wiener  and  Hopf  invented  an  ingenious  method  for 
solving  the  equation,  and  it  has  since  borne  their  name.  Their  so-called  Wiener- 
Hopf  technique  plays  a  central  role  in  linear  filtering  theory,  and  we  will  present 
it  briefly  here,  both  as  a  framework  for  later  discussion  and  as  a  service  to  our 
state-space  friends  who  mignt  conceivably  have  never  seen  it!  The  technique 
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was  originally  developed  for  diflerence  (or  convolution)  kernels  R,  here  we 
■  i  scribe  a  generalized  form  (for  arbitrary  kernels)  that  captures  the  main  idea 
To  focus  on  the  main  idea,  the  treatment  will  leave  aside  technical  issues 
(hypotheses  on  kernel  functions,  specification  of  function  spaces,  etc.)  that  are 
needed  to  build  a  rigorous  theory  (see.  for  example,  Devinatz  and  Shinbrot 
(1967)  and  Gohberg  and  Feldman  (1974)).  To  describe  the  technique,  we  first 
develop  an  operator  notation  for  (8)  that  expresses  the  constraint  s<f.  Thus  if 
L  is  an  integral  operator  associated  with  the  kernel  L{t,s),  define 

with 


m.s)u 


L(t,s)  if  s  ^  t 
0  if  s  >  f 


Accordingly,  define  (/!».;=/.  \f.\y  is  called  the  caitsof  part  of  L.  With  this  nota¬ 
tion  the  Wiener-Hopf  equation  (8)  becomes 


=  {A-j*  .  (9) 

As  only  the  values  of  h{t.s)  for  s<f  play  a  role  in  this  problem,  h{t  ,s)  can  be 
(and  will  be)  taken  equal  to  its  causal  peu-t;  A  =  (Aj*.  We  assume  that 
R=I^K.  R-R' ,  R  is  positive  definite  as  a  kernel,  and  K  does  not  contain  I 
term  (alternatively:  does  not  appear  in  A’(f,s)) 

The  key  idea  of  the  method  of  Wiener  and  Hopf  (1931)  is  to  assume  that  R 
can  be  suitably  factored.  In  our  case,  as 

R  =  R^'^R''^  , 


where  is  a  causai  and  causally  invertible  operator,  that  is, 

and  R~'^^  ={R'^^]~'  exists  euid  satisfies  /?"'''*=}/?'■'*(,.  Here  /?'•'*  denotes 

the  adjoint  of  that  is  ,s)=77*4®(s.f )  Such  an  /?•''*  is  called  the 
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canonical  factor  of  R,  and  when  it  exists  it  will  be  unique  as  a  consequence  of 
the  causal  and  causally  invertible  requirement.  [Observe  that,  despite  the  nota¬ 
tion.  is  not  the  traditional  operator-theoretic  (symmetric)  square  root  of 

a  positive  operator.] 

Now  make  the  simple  but  crucial  observation  that  if  h  solves  (9).  there 
must  exist  some  function  g  such  that 

( ^  =  0  and  hR=K  +  g  (10) 

Here  g  is  strictly  anti-causal,  i.e  .  it  does  not  have  any  /  component  Multiply¬ 
ing  (10)  on  the  right  by  we  have 

hR'^^  =  KR-'^^  i  gR-''^  .  (11) 

Now  apply  J  to  both  sides  of  (11).  Since  h  is  causal  and  R'^^  is  causal  and 
since  the  composition  of  two  causal  operators  is  again  causal, 

Likewise  the  composition  of  a  strictly  anti-causal  operator  {g)  with  an  anli- 
causal  operator  (/?“*^*)  is  strictly  anti-causal;  (/?  *''®=(/?  '''^)'  is  anti-causal 
since  is  causal)  Thus  The  end  result  is  hR'^^=■\KR  '^^\^, 

or 

h  =\KR  ’'^\,R~''^  .  (12) 

This  is  the  solution  of  the  Wiener-Hopt  equation. 

However,  we  have  not  really  so  far  used  the  assumption  that  R  has  the  spe¬ 
cial  form  R  =  l  +  K  For  this  case  further  important  results  are  available  from 
canonical  factorization  First  observe  that,  since  K  =  R-I, 

h  =  \{R  -  /)/?  '■'2 

=  \R''^  -  R 


(13) 
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However,  when  R  =  /  +  K,K^^^  must  have  the  form  K'^^=/+h,  where  h.  is  some 
strictly  causal  operator,  and  so  also  must  have  the  form  R  '^^=I+l. 

where  i  Is  strictly  causal  Therefore,  =  =  since!*  is  strictly 

anti-causal  Hence  (13)  reduces  to 

h  =  /  -  R  (14) 

This  striking  formula  has  several  interesting  implications 

First,  and  most  important,  it  shows  that  for  this  problem  canonical  factori¬ 
zation  and  ftltering  are  equivalent  problems;  h  is  immediately  determined  if 
R  ^''^  IS  known  and  vice  versa. 

Secondly,  if  R  is  applied  as  a  filter  to  y .  we  have 
7?-i/2y  =  (/  -  h)y  =  y  -  £ 

Since  y=zt-v.z  is  the  estimate  of  given  iy, ,  s <f  j ,  so  that  it  is  reasonable 
to  expect  that  t-he  new  information  or  innovation  process,  is  a  while 

noise  process,  consistent  with  the  calculation 

<R-'^’‘y,R-'''^y>  =  R-^'^<y .y>R-'‘'^  =  R  '^'^RR  =  1  . 

This  result  can  be  rigorously  established  under  quite  general  conditions  (see 
Kailalh  (1968),  Kailath  (1971),  Meyer  (1973)), 

Id.  A  Resolvent  Identity  relating  Smoothing  amd  mtering 

Recall  the  Fredholm  resolvent  of  K,  defined  by  I -H -{!  +  K)  '=/?  '.  By  vir¬ 
tue  of  (14).  we  can  write 

r  -  H  =  R-'^^R  =  U  -  A*)(/  -h)  (15a) 

which  immediately  yields  the  nice  formula 

H  =  h’  +  h  -  h'h  (15b) 

This  is  actually  an  old  identity,  known  by  the  early  1950's  when  it  was 
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discovered,  in  a  differential  version,  independently  by  Siegert,  Krein,  Bellman 
emd  others  (see  references  in  Kailath  (1974)). 

Now  when  the  signal  and  noise  are  completely  uncorrelaled,  we  saw  earlier 
(cf.  (4b))  that  the  Fredholm  resolvent  H  is  just  the  smoothing  filter,  the  iden¬ 
tity  (15)  then  shows  that  the  causal  filter  h  determines  the  smoothing  filler 
This  seems  to  contradict  the  remarks  we  made  in  Section  8b  about  the  relative 
difficulties  of  smoothing  and  filtering  as  they  appeared  from  thinking  of  the 
Wiener-Hopf  equation  as  an  infinite  family  of  Fredholm  equations  In  the 
approach  via  canonical  factorization,  it  would  appear  that  filtering  comes  first, 
emd  then  smoothing  In  Sec.  3d,  which  describes  a  scattering  theory  approach, 
this  sequence  will  again  be  reversed. 

We  shall  illustrate  these  different  relationships  between  filtering  and 
smoothing  by  considering  several  specific  examples  in  the  next  two  sections 
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2.  Some  fbcamples  -  Slationary  Processes 


■2a.  Scalar  Slationary  Proccses  Over  Infinite  Intervals 

These  problems  were  studied  by  Wiener.  Kolmogorov  and  Krein  Kor  a  con¬ 
cise  exposition  of  Wiener's  work,  see  a  paper  by  N  I.evinson  (19-17),  reprinted  as 
Appendix  C  of  Wiener  ('.949)  The  papers  of  Kolmogorov  and  Krein  arc  repri:  ted 
in  Kailath  (1977) 

We  suppose  v,  to  be  scalar  and  stationary,  then  R[t  =  -s) 

We  assume  the  existence  of 


the  Fourier  transform  of  R(l).  where  j  is  the  imaginary  unit  Sy{cj)  is  nonne- 
gative  (for  real  cv)  and  is  known  as  the  power  spectral  density  of  the  process  y 
We  assume  further  that 


7 


I  In  Sy{u)  I 
1  + 


du  < 


ec 


(16) 


If  this  is  not  the  case,  then  Kolmogorov  and  Wiener  showed  that  j/j  can  be 
predicted  perfectly  from  its  own  past  (see,  e  g  ,  Doob  (1953).  Ch  13) 

Under  this  assumption  the  canonical  factorization  of  R[l)  over  (-'”,'«) 
corresponds  to  a  factorization  of  3y(a>)  as 

Sy{u)  =  ^y{'J)Sy  [cS)  .  (17) 

where  Sy{a+ju)  is  analytic  and  bounded  in  the  right  half  plane  a>0,  Sy  (i.)  is 
analytic  in  the  left  half  plane  and 

Sy  (a  +  ju)  =  Sy{o  -  j'J)  ( ■  7a) 

It  can  be  shown  that  (see,  eg,  Solodovnikov  (I960)) 
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5;(u)  =  ,  (18) 

where  iJ(u)  is  the  Hilbert  transform  of  InVSy^y 

In  the  case  that  5y(cj+iff)  is  rational.  Sy(o+ia)  can  be  found  as  follows: 

Sj*(u  +  ja)  =  constant  x  Monic  polynomial  of  left  half  plane  zeros)  x  (19) 
(Monic  polynomial  of  aU  left  half  plane  poles)''. 

Tliis  follows  immediately  from  the  fact  that  Sy{a+jtj)  and  \/  Sy{a+j  u)  must 
be  analytic  in  the  upper  half  plane  (r>0.  and  from  (17a)  Note  that  (17a) 
Implies  S,'(u)=^c.0  =  S^*(-o),  because  R(t)  is  real  Therefore, 

Now  the  canonical  causal  factor  (see  the  text  between  (9)  and  (10))  can  be 
found  as 

=  F  'lSy*(u)\  .  (20) 

the  inverse  Fourier  transform  of  Sy{u)  The  optimal  filter  (see  (14))  is  then 
equal  to 


How  About  Smoothing? 

Consider  the  Fourier  transform  of  the  smoothing  filler  H  (see  (15)): 

Flfii  =  Fih  +  h’ -  h'hi  =  (22) 


This  is  a  well  known  formul.i  easy  lo  derive  directly  from  the  equality 
H  =  I-R  '  Note  that  smoothing  docs  not  require  special  factori/ati.m,  so  that 
the  smoothing  solution  is  easier  to  find  from  the  given  data  than  the  filtering 


solution. 
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Important  Remark:  When  Sy(o)  is  rational,  Sy{u)  is  rational  and  so  is 

F\h]  =  1 - }  This  can  be  readily  implemented  in  (a  variety  of)  state- 

Sy*(cS) 

space  forms,  so  that  *  can  be  "recursively  computed”,  as  noted  by  Whittle 
((1963),  p.  35)  and  others,  independently  of  the  direct  state-space  formulation  of 
Kalman  and  Bucy  (1961). 


2b.  Finite  Intervals  -  The  Ambartzumian-Chandrasekhar  Equations 

We  shall  next  talk  about  the  more  difficult  case  of  filtering  stationary, 
scalar  processes  defined  on  finite  intervals  This  may  be  considered  the  first 
natural  extension  of  Wiener's  work  and  it  began  to  be  studied  in  the  engineering 
iiterature  around  1950,  shortly  after  Wiener's  work  became  public  (see,  e  g  , 
Zadeh  and  Ragazzini  (1950))  The  finite  interval  case  presented  a  new  challenge 
because  spectral  factorization,  in  its  traditional  sense,  does  not  work,  and  thus 
researchers  tried  various  other  methods  to  find  the  solution  (see,  e  g.,  Solodov- 
nikov(1963)) 

However,  the  astrophysicists  V  A  Ambartzumian  (USSR)  and  S  Chan¬ 
drasekhar  (USA)  had  already  studied  such  problems  in  the  mid-194D's,  indepen¬ 
dently  of  engineers,  and  had  demonstrated  that  the  Wiener-Hopf  equation  could 
be  replaced  by  an  equivalent  Riccati  equation  Their  results  greatly  simplified 
the  numerical  computation  of  solutions,  and  since  computation  m  those  days 
meant  calculation  by  hand,  they  were  considered  to  be  a  great  success  The 
Ambartzumian-Chandrasekhar  theory  (see  Chandrasekhar  (1950))  assumes  that 
the  kernel  of  the  Wiener-Hopf  equation  has  of  the  form 


K{t  -s) 


*'ii;(a)  da 


(S'!) 


in  which  w(,a)  is  some  known  function  This  form  arose  from  the  physical  situa¬ 
tion  they  considered,  in  which  light  incident  at  an  angle  a  is  propagating 
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through  a  medium.  If  we  assume  that  the  light  is  incident  at  a  finite  number  of 
values  a,  so  that  to  (a)  is  a  sum  of  d-functions,  the  process  y  wll  have  a 
rational  spectral  density 


Sy{u)  =1  +  2 

isl 


a*  +  o* 


(24) 


The  first  result  of  the  Ambartzumian-Chandrasekhar  theory  is  that  to  solve 
the  Wiener-Hopf  equation  it  suffices  to  find  the  solution  Q{t  .a,/S)  to  a  Riccati- 
type  partial  differential  equation 
-  I 

!-<?(<. a, (?)  =  P+k  Qit.a.p)  +  /  Q(t.o.0)w(p)  d0  (25) 

01  Q 

1  I  I 

+  f  Q{t  .a  .fi)ih{a)  da  +  f  f  Q{t  .a. $)Q[t  d ,p)w{d)w[^’)  dad0 
0  0  0 

Q  can  be  computed  by  discretization  of  this  equation  to  obtain  a  finite  dimen¬ 
sional  system  of  ordinary  Ricatti  differential  equations  This  has  great  computa¬ 
tional  advantages,  though  since  it  may  be  required  to  compute  9(f  P.P)  for  t ,  a 
and  0  ranging  over  a  large  set  of  values,  this  is  still  burdensome. 

However  using  certain  physical  invariance  arguments,  Ambarlzumian 
(1943)  was  able  to  show  that  could  actually  be  computed  in  terms  of  two  func¬ 
tions  X(,l  ,y)  and  Y(t,y)  of  tux)  rather  than  three  variables  Then  Chan¬ 
drasekhar  (' 947)  derived  a  pair  of  differential  equations  for  X  and  K,  consider¬ 
ably  simpler  than  the  original  Kii  call  equation  (25) 

-  >'('-^)/  nt  .0)^{0)  dp  {26a) 

=  -  Yit  y)  -  X{t  .y)  f  Y[t  .0)vu{0)  d0  (86b) 

with 


XfO.y)  =  T(O.y)  =  I  .  0<  ><  1 
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Astrophysicists  were  quick  to  recognize  the  value  of  the  recursive  solution  of  the 
equations  (26)  (see,  e  g  ,  Sobolev  (1965),  p  79)  The  ideas  of  Ambartzumian  and 
Chandrasekhar  were  brought  to  the  attention  of  applied  mathematicians  by  the 
extensive  work  of  Bellman,  Kalaba  and  their  colleagues  on  what  they  called 
invariant  imbedding  (see.  e  g..  Bellman  and  W\ng  (1976)).  The  equations  (26) 
were  first  introduced  into  the  estimation  literature  by  Casti,  Kalaba  and  Murthy 
(1972).  Their  extension  to  nonstationary  processes  was  made  bv  KailaLh  (1973) 

Zc.  Sobolev's  Identity  and  The  Krein-Levinson  Equations 

Fundamental  work  on  the  finite  interval,  stationary  case  did  not  end  with 
Ambartzumian  and  Chandrasekhar.  In  particular  Sobolev  (1965)  went  on  to 
address  the  problem  of  arbitrary  K(t~s).  for  which  a  representation  such  as 
(23)  is  not  given,  and  he  succeeded  in  developing  a  much  more  direct  approach 
His  idea  was  to  exploit  the  Toeplitz  structure  of  K{t  ,s)=K{t-s)  more  deeply 
them  in  the  previous  theory  By  this  approach,  he  established  a  very  powerful 
constraint  on  the  FVedholm  resolvent  (smoothing  kernel)  H(l.s.T)  of  a  Toeplitz 
kernel 

If  A[T;t)  is  defined  via  the  equation 

T 

A(T.t)  +  /  A{T.u)K{u  -  t)  du  ~  K{T  -  t)  .  0<t  sT  (27) 

0 

and  B{T.t)  via 

T 

B(T.t)  +  /  B[T:u)K{u  -  t)  du  =  K{-t)  ,  0<  f  «  T  (20) 

0 

then  ^5obolev  showed  that 

^|//(f.s;7-)  =  A[T.t)A{T.s)  -  B{r:t)B[T.s)  (29) 


with 
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A(T:t)  =  HiT.t.T)  =  H'U.T.T)  (30a) 

and 

B(T:t)  =  HiO.t.T)  =  H'{t.0\T)  (30b) 

(Note  that  these  equations  are  written  for  the  general  case  of  matrices  K,  H ,  A 
and  B.)  Sobolev's  identity  shows  that  the  resolvent  B(s.t  ,T)  is  determined  for 
(f.s)  c  [0,  rjxfo,  r)  by  its  values  on  the  boundaries  of  [0,  r]x[0, 7"],  i  e  ,  by  two 
functions  of  one  variable. 

Sobolev's  identity  is  even  more  striking  in  its  integrated  form,  which,  when 
translated  into  operator  form,  is 

/ -/f  =  (/  -a*)(/ -a) -b*6  (31) 

where  o  amd  b  are  not  only  causal,  but  also  Toeplitz,  This  means,  for  exam¬ 
ple,  that  the  filter  determind  by  a. 

( 

(“y)i  =  /  “(<  -s)y. 

0 

is  time^vanant  Equation  (31)  is  a  useful  modification  of  the  formula  (15a) 
(/-H)=(/ -fi*)(/ -fi).  because  it  expresses  H  only  in  terms  of  time  invariant 
(causal  and  anticausal)  operators,  whereas  h.  even  for  Toeplitz  ff.  is  not  gen¬ 
erally  time  invariant  Since  time  invariant  filters  are  much  easier  to  implement 
than  time-variant  ones,  it  is  reasonable  to  use  o  instead  of  h.  whenever  h 
appears  Of  course,  an  error  is  then  incurred,  but  the  remarkable  implication  of 
(31)  IS  that  if  o  replaces  h  in  (15a).  the  correction  can  again  be  made  with  a 
time  invariant  filter  b  Explicit  formulas  for  o  and  b  are  as  follows 

a[t)  =  A(T:t)  (32a) 


6(<)  =  B{T;T  -  t) 


(32b) 
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For  more  on  the  applications  of  this  identity  to  smoothing  and  other  prob¬ 
lems,  see  Levy  et  al  (1979)  and  Kailath  et  al  (1978) 

Sobolev's  Identity  also  has  important  implications  for  fast  computation  of 
the  resolvent  kernel  HH.a.T)  because  we  need  only  develop  an  efTicicnt. 
recursive  method  for  updating  the  boundary  values  A[T,t)  and  B{T,t)  of 
H(t,s',T)  In  fact,  Krein  (1955)  had  obtained  such  equations,  which  wc  shall 
present  here  in  the  special  case  of  scalar  processes: 

^jf+j^y{T.s)  =  -A(T,T-s)A{T.O)  (36) 

To  see  what  this  means,  consider  the  nanc  discreti/'ation,  T=N  A  and 

A{T  +  A;tA  +  A)  =  /K^iA)  -  /l(7’:  T  -  t  A)/l ( r,0)A  (36) 

which  propagates  as  illustrated  in  the  figure 


The  one  point  not  picked  up  by  this  scheme  is  y4(7'  +  A,0)  and  so  it  is  com¬ 
puted  by  using  the  integral  equation  (27): 


T*6 

A{T  ,  u;0)  =  f  A{T  +  A;u)Af(ii)  rfu  +  K{T  +  A) 
0 

A(T  A,i:A)/r(tA)A  +  A'(r  +  A) 

i»l 


(37) 
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How  fast  is  this  algorithm?  If  T=N-^,  it  takes  A/  +  1  multiplications  to  go  from 
T  to  r+A.  Therefore,  to  generate  the  boundary  out  to  T,  takes 
1+2+ ..+yv=Af(,W-El)/2=0(A^®)  multiplications.  Without  the  Toeplitz  structure, 
we  would  need  0(iV®)  operations  to  compute  H{t  ,s  -,T) 

The  above  method  of  solving  (36)  via  discretizat.on  leads  to  recursions  very 
simileir  to  those  introduced  for  the  prediction  of  discrete-time  processes  by 
Levinson  (1947)  and  since  then  widely  studied  in  the  signal  processing  literature. 
We  therefore  call  (36)  a  Krein-Levinson  equation  Kailath,  l.jung  and  Morf  (1978) 
have  extended  these  techniques  to  nonstationary  processes. 

We  should  also  mention  that  when  a  representation  of  K{t -s)  in  the 
exponential  form  (23)  is  available,  the  Krein-Levinson  equations  can  in  fact  be 
reduced  to  the  Ambartzumian-Chandrasekhar  equations  (cf  Kailath,  Ljung,  Morf 
(1976)) 


I 


1 
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3.  Some  Kicamples  -  Nonstationary  Processes 

3a.  State  S^ce  Process  Models  -  Kalman-Bucy  niters 

The  most  common  approach  to  nonslalionary  process  estimation  is  via 
state-space  models  It  is  assumed  that  the  signal  process  2(  can  be  described 
as 

Zt  =  HiX,  (3Ba) 

Xj  =  F|i|  +  GfUf  ,  1  >  to  .  (3Bb) 

where  ij  is  an  nxl  "state"  vector.  x<  is  a  pxl  vector  and  u,  is  an  mxl 
"white  noise"  vector.  The  observed  process  is 

Vt  =  *1  +  12,  (3Bc) 

where  is  the  observation  noise  v,  such  that^ 

[“.".])=  [4 

The  initial  slate  x,^  is  assumed  to  be  reuidom,  with 

£■  x,jp  =  0  ,  E  x,gxi^  =  Ho  .  (3ee) 

and  furthermore,  it  is  also  assumed  that 

E  =  0  ,  E  Vfxi^  =0  for  oil  1  >  lo  (30f) 

Finally,  the  matrices  Fl,Ci,//,,no  are  all  assumed  to  be  known  The  above 
assumptions  ensure  that 

£■  12, 2^  =  0  if  s  <  1 


They  also  ensure  that  x,  is  a  Markov  process,  so  that  the  signal  z,  is  modeled 

In  the  literature  it  is  corrmon  to  assume  E 'Uf'U^  =  Qi6{t —s)  and  12, 12,  = /f,  (5(f  — .s), 

/?(  >0,  but  without  loss  of  generality  we  can  make  the  convenient  normal '.zatioTis  of  (hBd) 
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(IS  a  so-called  pn^sction  of  the  Markov  state  process  Xj.  Such  descriptions  had 
been  widely  used  by  physicists  (see,  e  g  .  the  papers  in  Wax  (1954))  and 
mathematicians  (see.  e  g.,  Doob  (1944).  (1948))  for  stationary  processes  with 
rational  power  spectral  densities.  The  extension  to  models  with  time-variant 
coefficients,  as  in  (38)  above,  is  at  least  in  retrospect  fairly  natural  and  it  began 
to  be  made  in  estimation  theory  in  the  late  fifties  by  Laning  and  Battin  (1956.  p 
304),  Stratonovich  (1959),  (1960),  and  most  notably  by  Kalman  (1960)  and  Kal¬ 
man  and  Bucy  (1961) 

The  celebrated  Kalman-Bucy  filter  for  the  model  (36)  is 

*•  =  Htx,.  t  >  to  (39a) 

X  =  FtXf  +  A)i/i  (39b) 

^1,  =  0  (39c) 

Here  u,  is  the  '  innovation  process  '. 

i-i  =  V«  -  (39d) 

which  IS  known  to  be  while  with 

£■  =  /  d(f  -  s)  (39e) 

The  nxp  matrix  Kf,  which  we  shall  call  the  "Kalman  gam",  can  be  computed  as 

K,  =  f\  H,  +  Q  C,  (40a) 

where  the  nxn  matrix  Pi  is  the  covariance  mat  nx  of  the  errors. 

Pi  -  K  x,r,  .X,  -  X,  -  z ,  .  (40b) 

Kalman  and  Bucy  showed  that  I'l  could  be  recui  .sively  determined  as  the 
unique  solution  of  the  nonlinear  Bici  alr  type  difTerential  equation 

Pi  =  PiP,  +  Pi  Pi  ♦  (ii(i  -  hi  Hi  .  Pi^  =  :io  (40c) 
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This  is  all  very  well  known  to  eslunation  theorists  by  now.  It  is  perhaps  not 
so  widely  known  that  the  recognition  of  the  importance  of  state-space  models 
and  Markov  processes  in  signal  estimation  problems  is  due  independently  to 
Stratonovich,  who  actually  studied  the  nonlinear  filtering  problem,  and,  using 
"Gaussian  approximation"  methods,  derived  what  was  later  called  the  "extended 
Kalman  filter”.  For  the  linear  case,  Stratonovich  gave  an  exact  solution  which  is 
exactly  the  Kalman-Bucy  filter.  However,  no  stability  einalysis  was  given  and  no 
intensive  study  of  the  Riccati  equation  was  made;  these  were  the  vital  contribu¬ 
tions  of  Kalman  and  Bucy 

It  is  often  said  that  the  reason  for  the  wide  applicability  of  the  Kalman-Bucy 
filter  (in  particular,  to  general  time-variant  models)  is  that  giving  a  slate-space 
model  avoids  the  difficult  problem  of  "spectral  factorization".  This  is  totally 
wrong!  Canonical  spectral,  or  in  the  nonslalionary  case  "covariance",  factoriza¬ 
tion.  results  in  a  model  that  is  not  only  causal  but  also  causally  invertible.  This 
is  clearly  not  the  case  for  the  assumed  model  (38)  knowledge  of  y=\y,.  J 

does  not  allow  us  to  reconstruct  u,v  and  Xf^.  Therefore  to  find  the  filter, 
canonical  factorization  still  has  to  be  done  in  one  way  or  einother 

In  fact,  as  we  noted  in  Section  2c  (cf.  (14)),  knowledge  of  the  canonical 
spectral  factor  should  immediately  determine  the  filter  and  vice  versa.  Here  we 
have  the  filter  and  could  ask  how  to  obtain  the  canonical  factor  The  answer  is 
simple:  just  rewrite  (39)  in  the  form 

Xi  =  FtXi  +  K,Vi  (41) 

Vt  =  Htx,  +  V,  .  xt^  =  0  . 

We  see  that  (41)  describes  a  causal  and  causally  invertible  relation  between  a 
white  noise  input  v  and  the  desired  process  y  We  call  it  the  innovations 
representation  (IR)  of  y  Thus  the  above  equations  determine  the  true 
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canonical  factor.  But  to  find  this  factor,  we  have  to  do  quite  a  bit  of  work,  viz  , 
solve  the  Riccati  equation.  In  other  words,  assuming  a  state-space  model  does 
not  in  any  sense  allow  us  to  avoid  determining  the  canonical  filter  (unless,  of 
course,  we  start  with  such  a  model). 

A  Remark  on  Derivations 

By  now  numerous  proofs  of  the  Kalman-Bucy  equations  are  available  In  the 
present  context,  it  may  be  interesting  to  note  that  if 

( 

X,  =  /A,(f  ,s)v,  ds 
*0 

then  h^{t,s)  obeys  the  Wiener-Hopf  equation 

t 

+  f  h^{t.T)K(T,s)  dr  =  K{t  ,s)  , 

0 

where  K{t  ,s)  can  be  readily  computed  from  the  given  model  (3B)-in  fact,  see 
(44)  below.  Some  calculation  will  then  Show  that 

SO  that 

rf  -  r 

=  f  [F{t)  -h^{t.t)H{t)]h^{t.s)y.  ds  +  fi^(f,f)y,y 

CM  Q 

=  /’(f)z,  +  h,y(f,t)[y,  -  //(f  )i',]  , 

which  will  be  the  Kalman-Bucy  equation  (39a)  when  we  show  that  h^{t  ,t)=Kt  as 
given  by  (40)  We  shall  omit  these  calculations 

3b.  Slate  ^Mce  Covariance  Models  -  Recursive  Wiener  Filters 

While  many  different  models  ,//<  .Floj  cein  yield  the  same  covari¬ 

ance  function 


R{t.s)  =  /  d(f  -s)  +  K{t.s)  , 
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the  impulse  response  of  the  canonical  factor  is  known  to  be  uniquely  determined 
by  R{t.s).  Therefore  one  would  expect  that  it  is  possible  to  compute  the  Kal¬ 
man  gain  Kt  (which  should  not  be  confused  with  the  term  K{t  ,s)  in  the  covari- 
2uice  funtion)  directly  from  the  parameters  of  the  covariance  function  /f(f,s). 
This  can  in  fact  be  done  (Kailath  and  Geesey.  1971).  as  can  be  seen  by  examining 
the  special  form  of  the  covariance  function  associated  with  a  state-space  model 
of  the  form  (38). 

Let  i(t,s)  be  the  state  transition  matrix,  which  is  the  unique  solution  of 
the  linear  differential  equation 

— =  F,m.s)  ,  4>(s,s)  =  / 

Let  n<  be  the  n  xn  covariance  matrix  of  z, , 

n,  =  £■  z,z,  .  (43) 

It  is  easy  to  check  that  Hi  is  the  solution  to  the  Lyapunov-type  equation 

n*  =  Hi  +  rij  +  Gj  G(  ,  ( 43) 

with  given  initial  value  Flo  By  direct  calculation  we  have 

f/  ■  6(f  -s) -1- //,4>(<.s)A',  iffs-s 
"  (/  (5(f  -s) -I- if/ss 

where 

N,  =  n,  fit  +  QQ  (43) 

We  will  now  assume  that  (only)  fft, /•}  and  Nt  are  known  Define 

:=  ex,x,  (46) 

If  we  recall  that  Pt  =E  x,x  i.  where  z,=z, -Z(,  then  the  orthogonality  of  Z( 
and  Zj  immediately  yields  the  identity 

n,  =  /",  +  S,  (47) 
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Therefore  we  can  now  rewrite  the  Kalman  gain  (cf  (39f))  as 

Ai  =  Pi  Hi  +  Q  Q  =  II,  Hi  CkCi  Hi  =  Ni  -Y.I  Hi  (46) 

Moreover,  note  that 

=  n,  -  P,  =  P,i;,  +  THiFi  +  KiKi  ;  i:,,  =  0  (49) 

The  equations,  (40)  and  (49),  determine  AT,  completely  in  terms  of  the  parame¬ 
ters  \Hi,Fi,Ni\  of  the  covariance  function  R{t,s) 

We  note  that  we  have  effectively  also  obtained  a  recursive  form  of  the  solu¬ 
tion  to  the  Wiener-Hopf  integral  equation  (6)  for  estimating  the  signal  z,  from 
y, ,  provided  the  covariance  function  R{l.s)  is  given  in  the  (factored)  form  (44) 
This  provides  a  nice  answer  to  the  efforts  of  several  investigators  in  the  mid- 
1950's  attacking  Wic.ner-Hopf  equations  with  nonslationary  kernels  (see,  e  g  , 
Dolph  and  Woodbury  (19S2),  Shinbrol  (1956),  Zadeh  and  Miller  (1956),  l,aning  and 
Baltin  (1956))  We  may  thus  call  our  solution  (39),  (4B)-(49)  a  recursive  Wiener 
filter  as  compared  to  the  Kalman-Bucy  filter  (39)-(40) 

The  close  connection  we  noted  in  Sec  Ic  between  the  solution  of  the  filter¬ 
ing  problem  and  of  the  canonical  factorization  problem  means  that  the  above 
results,  especially  (41)  and  (48)-(49),  also  give  us  an  expression  of  the  canonical 
factor  (innovations  representation)  directly  in  terms  of  the  peiramelers  of  the 
covariance  function  The  notation  x  for  the  stale  of  the  model  (40)  is,  of 
course,  just  a  reflection  of  the  original  noncanonical  model  (38)  that  we  started 
with,  X  has  no  particular  significance  if  we  are  only  given  a  process  y  with 
covariance  R  =  l  +  K.  and  therefore  it  is  preferable  to  rewrite  (40)  m  a  different 
notation  following  Faurre  (1969),  we  shall  write  the  canonical  innovations 


model  as 


X,  =  A’(X,  +  ,  X.0  =  0 

Vi  =  +  I's 


(50a) 

(50b) 
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where 

K^=N,-'S.^Hi  (50c) 

£ind  T,  •  obeys  the  Riccall  equation 

L’<j  =  +  S<i/q  +  Kf  Ki  S *0  “  0  (50d) 

The  value  of  our  particular  derivation  of  the  IR  is  that  it  shows  that  of  all 
(causal)  models  of  a  given  covariance  triple  [f't.Ht.Ntl.  this  one  has  the  "smal¬ 
lest"  state  variance  matrix,  because  its  state,  z., ,  is  the  projection  (on  the 
space  spanned  by  \y,.  s<t  j)  of  the  state,  z, .  of  emy  other  model 

There  has  been  some  interest  in  studying  the  set  of  all  causa)  models  asso¬ 
ciated  with  a  given  covariance  triple,  in  particular,  one  might  ask  by  analogy 
with  the  above  for  a  maximurn-^ariance  causal  model.  It  turns  out  that  this 
model  can  be  defined  via  a  certain  smoothing  problem,  rather  than  a  filtering 
problem  as  for  the  minimum-variance  model  We  refer  to  Kailath  and  Ljung 
(1981)  for  a  discussion  of  this  result,  and  its  implications  for  the  so-called  sto¬ 
chastic  realization  problem  of  studying  all  causal  models  of  a  given  process 
There  have  been  a  number  of  recent  papers  on  this,  see  especially,  the  book  of 
Clerget,  Faurre,  Germain  (1978),  the  thesis  of  Ruckebusch  (1979),  and  Lindquist 
and  Ricci  (1981),  which  contains  references  to  several  of  their  earlier  works. 

3c.  Orthogonal  Decomposition  of  the  Space  of  Random  Variables 

Here  we  shall  continue  instead  with  another  aspect  of  the  interplay  between 
smoothing  and  filtering  theory,  as  brought  out  by  a  recent  result  of  Weinert  and 
Desai  (1980),  which  we  can  formulate  as  follows 

Given  the  state-space  model  (38),  it  is  easy  to  see  that  we  cannot  recover 
the  'input"  random  variables  just  from  knowledge  of  the  "output’ 

random  variables  ii/jj,  unless  we  have  additional  information  The  nice  observe- 
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tion  of  Weinert  and  Desai  is  that  this  additional  information  can  be  provided  by  a 
model  'adjoint’  to  (36)  in  a  certain  sense  In  particular,  define  jr?,  ,0j  by  the 
equations 


Xt  =  -  F'lXi  -  Hivt  .  Xt,  -0  (51a) 

Vt  =  -  GiXt 
and 

9  =  -  rioXio  +  (51c) 

Then  we  can  check  by  direct  calculation  that 

E  Qyi  ^  0  .  E  Xty'i  0  (52) 

and  that  the  sets 

and  \yt.r)t.®l  (53) 


can  each  be  recovered  from  the  other  by  causal  linear  operations  In  fact,  this 
latter  equivalence  can  be  seen  by  first  combining  the  equations  (36)  and  (51) 
into  the  set 


><  0 

hi 

E  h  0 

'h 

j 

0  -/■, 

r 

1 

hi 

1° 

1  p 

Vt 

>■< 

-C, 

y< 

0 

0 

Xt 

■L'l 

Then,  by  elimination,  we  con  write 


I 

-Cn  C’n 

h 

[  f^i^i 

{-fltHt 

-Ft 

h 

{~i!,yi 

with  the  "two-pomt  "  boundary  value  conditions 


=  0  .  l  loXi,  =  0 


(5'la) 


(54b) 


(55a) 


(55b) 


Given  j0,Vi ■'?(!.  can  solve  this  linear  two-point  boundary  value  problem  for 
(Z(  ,X(  i  and  then  calculate  Jiq  ,V|  .i(0)!  from  (51b. c) 


J 
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We  can  summarize  the  above  discussion  by  saying  that  the  result  of  Weinerl 
and  Desai  shows  that  the  lineau'  space  of  random  variables  spanned  by  jit(  .v,  .zoj 
can,  at  each  / ,  be  orthogonally  decomposed  into  the  space  spanned  by  [yt  j 
and  (rjj.Sj 

[The  following  remarks  might  help  explain  the  origin  of  the  above  equations 
The  point  is  that  given  )V(.  we  cannot  reconstruct  at 

best  only  their  smoothed  estimates  For  a  full  reconstruction  we 

must  augment  jy,  j  by  the  errors  \ut.v,.x  oj  Now  some  simple  calculation, 
which  is  facilitated  and  in  fact  illuminated  by  the  use  of  operator  notation,  will 
show  that  the  [ry  ,9j  as  defined  above  span  the  same  space  as  the  \u  ,.v  t  .x  ol 
Hence .  ] 

Of  the  several  interesting  implications  of  the  above  results,  we  shall  here 
focus  on  just  one. 

3iL  The  Hsuniltonian  Equations  and  a  Scattering  fVaroework 
for  Estimation  Theory 

Consider  the  projection  of  the  random  variables  in  the  equations  (55)  onto 
the  space  spanned  by  \yt. 

Now,  if  we  define 

=  Xdif  (56) 

and  note  that  the  orthogonality  properties  (52)  imply  that  and 

the  equations  (55)  reduce  to 

*i|i/ 

with  boundary  conditions 

=  0  , 


F,  -c,  d 

0 

HtHt  Ft 

H, 

Vt 


(57a) 


(57b) 
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These  are  the  so-cedled  Hamiltonian  equations  for  linear  estimation  They 
were  first  obtained  by  Bryson  and  Frazier  (1963)  in  exploring  a  calculus  of  varia¬ 
tions  approach  (maximizing  a  'likelihood'  function)  to  least -squares  estimation. 
Verghese.  Friedlander  and  Kailath  (19B0)  used  them  to  develop  a  so-called 
scattering  theory  approach  to  linear  estimation  To  introduce  this,  we  shall  for 
convenience  change  notation  in  (57): 

to  -•  T.  f  »  s.  tj  *  t  . 

Then  using  Euler  discretization,  e  g  . 

^  II  ~  [^+A|i  A  0  (A) 


and  the  approximclion 


f  y„  da  =  3/, A  +  o(A)  . 

* 

we  can  obtain  the  following  discrete  approximation  to  (57); 


•■AK  1  _ 

I+F.H 

C.Q,'A 

*.li 

0 

A,  u  j 

< 

1 

/+F,A 

A««ai< 

(56) 


where  we  have,  and  shall  in  the  future,  consistently  omit  all  o  (A)  terms  Note 
the  arguments  of  the  X  |( ,  which  are  reversed  from  those  of  i  n  Because  of 
this,  we  can  depict  (58)  graphically  as  in  Fig.  1,  which  suggests  that  we  can 
regard  x(  \t)  as  a  forward  wave  and  X(  |<)  as  a  backward  wave  travelling 
through  an  incremental  section  at  s  of  some  scattering  medium  specified  by 
the  quantities 

/  +  F,h  =  the  incremental  forward  transmission  coefficient 
/  +  F,A  =  the  incremental  backward  transmission  coefficient 
-H,H,^  =  the  incremental  left  reflection  coefficient 
G,  fi'A  =  the  incremental  right  reflection  coefficient 


ngure  2.  A  macroscopic  scattering  section 


A 
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We  can  pul  together  such  incremental  sections  to  get  a  macroscopic  sec¬ 
tion  of  the  scattering  medium  from  say  s=t  to  s=t.  This  is  shown  in  Fig.  2, 
where,  for  reasons  that  will  be  clear  very  soon,  we  have  denoted 
♦o(^.t)  =  the  /oraard  Iramsmission  operator 
♦o(f  ,t)  =  the  backward  transmission  operator 
Pa(t,T)  =  the  right  reflection  operator 
Oo(t,T)=  the  ie/f  reflection  operator 

go(/,T)=  the /onmrd  internal  source  (ie,  y{))  contribution 
9o  .t)  =  the  backward  internal  source  contribution 


The  reasons  for  this  notation  can  be  seen  by  considering  the  effect  of  adding  an 
incremental  section  from  t  to  f+A,  as  shown  in  Fig  3  By  tracing  paths 
through  the  figure,  we  can  write 

*o(<  +  A.r)  =  (/  +  -  (/  +A’A)PoW'WA$o(f  .t) 

+  (/  -I-  - 

=  (/ ■E/’A)(/  -  +  o(A)4'o{f  .t)  (59) 

where  o(A)  denotes  terms  that  go  to  zero  faster  than  A  as  A-»0  Then  we  see 
that 

hm  ~ =  {F  -  PoH'H)Mt  t)  (60) 

which  identifies  as  the  state-lremsilion  matrix  of  F-P^H'H 

Similarly,  we  can  see  by  tracing  paths  through  Figure  3  that 

Fo(<  +  A,t)  =  GG'A  +  (/  +  rA)(Po  -  Pof^'HPo^  +  o  (A))(/  +  TA)  (61) 

so  that 


Urn 


Po{t  *  t^.T)  -  Po(f.T) 
A 


=  CG  +  FPo  +  PoF'  -  PoH' HP 


(62) 


338 


T.  KAILATH 


which  identifies  Po(f  .t)  as  the  solution  of  a  Rlccati  differential  equation.  A 
similar  calculation  will  identify  Oo(f.T)  as  an  observability  Gramian  of  the 
matrices  H ,H\ 


ngure  3.  To  determiae  the  (forward)  evolution  equation  Of5o(/.T) 


We  shall  collect  these  operators  in  a  so-called  scattering  matTix 


ft  ■r'i- 

SoU.T)-  _o,(t.T)  $0(1  ,t) 


(63) 


and  our  previous  calculations  show  that 


^So(f.T)  = 


\(F-PoH'H)io  FPt+PoF'+CXf-Pofi’HPo 
*'o{F-PoH'H)' 


(64a) 
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With  initial  conditions 

•5>’o(t,t)  =  5]  (64  b) 

These  initial  conditions  help  to  explain  the  subscript  O'  on  the  various  quanti¬ 
ties  above,  and  we  can  gel  some  more  insight  into  this  by  going  back  to  the 
boundary  conditions  of  the  original  Hamiltonian  equations  and  trying  to  incor¬ 
porate  them  into  our  scattering  picture  The  \i|(=0  condition  means  we  have 
no  backwards  incoming  wave,  while  the  condition 

*T|I  =  noArj^ 

can  be  incorporated,  as  shown  in  Fig  4,  by  adding  a  boundary'  layer  to  the  left 
of  the  section  Sb(1,t)  of  Fig  2. 

One  immediate  result  from  this  figure  is  that  we  can  identify  jijo  .?o  i  as 
the  emerging  ruaves  from  the  medium  when  llo=0  We  shall  denote  those  as 

go'(^T)  =  MtIO  .  go  (' .t)  =  io{MO  (65) 


I  A(T|t)  x(t|t)=0 

figure  4.  Incorporating  the  boundary  conditions. 
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We  shall  now  show  how  to  derive  forward  differential  equations  for  X© 

Xo  and  also  backward  equations  as  in  (i7)-(i0)  for  forward  equations,  we  start 
by  adding  an  incremental  section  to  the  one  in  Fig,  4  (but  with  n=0=j:o).  Doing 
this  gives  the  result  shown  in  Fig.  5b. 

Combining  the  signals  in  the  two  pcu"ts  of  Fig  5a  we  obtain  for  the  quantities 
in  the  combined  section  of  Fig.  5b.  Ine  relations 

X(i(f  |t  +  A)  =  Xo(7lO  +  ~  Hxoit  |f  +  A)A  +  o(A) 

so  that 

(66) 

So  also,  we  can  read  off  the  relations 

*o(<  +  A|f  +  A)  =  (/  +  FA)io(f  1/  +  A) 

£o{t  U  +  A)  =  £o(t  If)  +  PoH'iy  -  H£oit  |f  +  A))  +  o(A) 

so  that 

£o{t  If)  =  F  £o{t  |f  )  +  PcH'iyU)  -  H{t)£o{t  |f ))  (67) 

which  can  immediately  be  recognized  as  the  Kalman-Bucy  equation,  thus 
explaining  the  notation  in  (65).  What  we  wished  to  illustrate  here  is  that  the 
state-space  filtering  equations  can  be  derived  from  the  (scattering  framework 
based  on  the)  Hamiltononian  equations  for  the  smoothed  estimates  which,  of 
course,  is  the  opposite  of  the  usual  order. 

There  are  many  other  illuminating  consequences  of  the  scattering  picture, 
which  not  only  helps  to  organize  in  a  very  efficient  way  many  of  the  special 
results  and  identities  associated  with  state-space  estimation,  and  has  led  to  new 
results  on  backwards  Sfarkovian  models,  smoothing  formulas,  asymptotic  pro¬ 
perties.  fast  algorithms,  decentralized  estimation,  etc.  However,  we  shall 


(b)  The  combined  aecUon  riiowing  the  resulting 
signals  at  diflerenl  points 
ngureb.  Determining  the  forward  evolution 


IL 
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3e.  Nonstationafy  Processes  Studied  as  Processes  Close  to  Stationary 

In  studying  the  Ambartzumian-Chandrasekhar  equations  of  Sec  8b  and 
especially  their  extension  to  nonstationary  processes  generated  by  constant 
coefficient  state-space  models  (Kailath  (1973))  in  a  scattering  framework. 
Kailath  and  Ljung  (1975)  noticed  that  the  Sobolev  and  Krein-l,evinson  equations 
of  Sec  Sc  could  be  extended  to  nonstalionary  processes  by  using  the  concept  of 
an  index  of  nonstationarity  of  a  kernel  or  process  We  shall  briefly  outline  the 
main  idea  here 

We  first  define  a  so-called  displacement  operator  _!  by 


and  we  note  that  if  A'(f,s)  =  K(t-s)  (and  is  differentiable),  then 


(68) 


_]/(■=  0 


For  a  non-stationary  process  we  may  have 

_!A'(f,s)  =  .  S  <  »  (69) 

1=1 

for  some  smallest  o,  and  a  family  of  functions  (^i(f)!  (For  simplicity,  we  have 
assumedp  =  l,  i.e  ,  scalar  processes  and  scalar  kernels  ] 


Bumple  t. 

For  the  Wiener  process  K{t  .■^)=Tnin{t  .s)  and  a  =  l 


Example  2. 

For  a  state-space  model  (30)  with  consteuit  parameters,  it  can  be  shown 
that  a<n,  the  dimension  of  the  state  space 
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Example  3. 

For  a  composition  of  kernels 

r 

(/Ti/faXf.s)  =  J  Ki(t,u)Ki(u.s)(iu. 

0 

we  obtain  the  rule 

-I  =  (_|  Kt)K2  +  A-jUI  Ifz)  +  K,6aKi  -  KtdfKz  (70) 

where 

T 

Ki6gKz  =  f  K(t.u)S{ii  -  g)K2{u.s)  du  =  Ki(t  .g)Kz{g  .s) 

0 

The  composition  rule  gives  us  an  easy  derivation  of  the  Sobolev  identity  For 
this,  we  apply  the  displacement  operator  to  the  equation  H+HK=K  and  use  the 
rule  (70)  to  obtain,  after  some  rearrangement,  the  result 

(_1W)(/  +  /(')  =  (/-  W)_|A-  +  H{6t  -  6o)K 
Noting  that  (/+/<■)  '=/-//  and  K{l+K)''=H  we  then  obtain 

^\H  ~{f  ~JH)(-\K){1  -  H)  ^  H{Sr  -So)H  (71) 

In  the  stationary  case  _|A'=0  and  we  have 

-IH  =  HdfH  -  H6aH 

or  written  out 

_l/f(f,s;r)  = //(f  ,r.r)ff(r.s;7')  - //(f,o,r)/f(o,s:7’)  .  (72) 

which  is  just  the  Sobolev  identity  (29)  of  Sec.  2c. 

The  interesting  fact  is  that  this  identity  can  be  extended  to  nonstationary 
processes  by  using  a  slight  modification  of  the  representation  (69)  l.et  us 
rewrite  (69)  in  the  form 


_| AT  =  K{t.Ci)K{0.s)  +  £  Aj£4(0<4'(s)  =  f<6oK  4-  D,M),  .  say 


(73) 
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Substituting  into  (71)  we  will  have 

_!//  =  (/-  H){K&oK  +  DiKD.)U  -  //)  +  H6tH  -  H&aH 
=  H6aH  +  (/  -  H)Di\D,(l  -  W)  +  H6tH  -  H6cH 

=  H6tH  +  CikC,  ,  say  (74) 

where  C  =  D(I -H),  or  equivalently.  C  satisfies  the  integral  equation 

C(1  +  K)  =  D 

Note  that  C  will  be  a  Ixa  vector  of  functions,  with  a  =  l  in  the  stationary  case 
Thus  a  can  serve  as  a  measure  of  nonstationarity  of  the  process,  and  (74)  is  the 
corresponding  Sobolev  identity  for  nonstationary  processes  By  simple  further 
calculations  we  can  also  obtain  generalized  versions  of  the  Krein-Levinson  equa¬ 
tions  of  Sec.  2c 

We  refer  to  Kailath,  Ljung,  Morf  (1976),  (1978)  for  discussions  of  the  compu¬ 
tational  aspects  of  these  equations  and  of  the  role  of  the  parameter  a  We 
regret  that  reasons  of  space  and  time  do  not  permit  us  to  describe  here  results 
on  some  efficient,  so-ceJled  ladder-form,  implementations  of  these  equations. 
These  results  allow  us  to  carry  over  to  processes  without  (finite-dimensional) 
state-space  models,  the  basic  computational  advantages  of  the  state-space 
assumption  These  were  briefly  mentioned  in  the  conference  lectures;  for  more 
details  we  refer  to  the  theses  of  D.  T  L  Lee  (1980)  and  H  Lev-Ari  (1981) 

4.  A  Concluding;  Remeu-k 

In  the  nonlinear  filtering  problem,  the  state-space  assumption  has  by  no 
means  been  as  useful  as  in  the  linear  case,  since  it  leads  to  difficult  nonlinear 
stochastic  partial  differential  equations.  It  may  be  that  return  to  an  input- 
output  formulation,  perhaps  based  on  the  Wiener-Volterra  reprsentalion,  can  be 
combined  with  analysis  along  the  lines  of  Secs  2c  and  8c  to  make  some  compu¬ 
tational  progress  in  the  nonlinear  filtering  problem 
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